Data management for data aggregation

ABSTRACT

The invention provides a method, system, and program product for managing data for data aggregation, including data mining and reporting.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of currently co-pending U.S. patent application Ser. No. 12/027,284, filed on Feb. 7, 2008, which is incorporated herein by reference in its entirety for all that it contains in order to provide continuity of disclosure.

TECHNICAL FIELD

The invention relates generally to data management and, more particularly, to data management for data aggregation, including data mining and reporting.

BACKGROUND OF THE INVENTION

The use, operation, and maintenance of databases often involves data mining and reporting. These are often incompatible operations. Current data mining methods involve the duplication of a database's contents so that the data can be mined without disabling or otherwise interfering with the use of the production database. However, such methods necessarily employ data that is not current (i.e., because the duplicated database is being mined, changes in the production database will not be reflected until the database is reduplicated). Typically, the “lag” between the duplicated and production databases is between eight and 24 hours, an unacceptably long period in many instances.

Reporting methods suffer from similar deficiencies. For example, in order to ensure that the report reflects the most current and accurate state of the database, some reporting methods query the production database itself. This necessarily interferes with any concurrent use of the production database and may do so for several hours. In addition, in order to avoid having to repeat such queries and their consequent interference with the production database, the report output is typically stored outside the database itself. As a result, report outputs representing multiple states of the production database, none of which may be current, may be available to a user. A further deficiency in such a method is that the report output may be available to individuals who otherwise may not have the permissions necessary to access the production database itself, thus creating a security threat.

Accordingly, there exists a need in the art to overcome the deficiencies and limitations described hereinabove.

SUMMARY OF THE INVENTION

The invention provides a method, system, and program product for managing data for data aggregation, including data mining and reporting.

A first aspect of the invention provides a method of managing data for data aggregation, the method comprising: determining the locations of data to be collected within a source database; acquiring at least one access configuration log of the locations from which data will be collected; simultaneously collecting data from a plurality of the locations; aggregating the collected data; normalizing the aggregated data; storing the normalized data; and releasing the data in the source database.

A second aspect of the invention provides a system for managing data for data aggregation, the system comprising: a system for determining the locations of data to be collected within a source database; a system for acquiring at least one access configuration log of the locations from which data will be collected; a system for simultaneously collecting data from a plurality of the locations; a system for aggregating the collected data; a system for normalizing the aggregated data; a system for storing the normalized data; and a system for releasing the data in the source database.

A third aspect of the invention provides a program product stored on a computer-readable medium, which when executed, manages data for data aggregation, the program product comprising: program code for determining the locations of data to be collected within a source database; program code for acquiring at least one access configuration log of the locations from which data will be collected; program code for simultaneously collecting data from a plurality of the locations; program code for aggregating the collected data; program code for normalizing the aggregated data; program code for storing the normalized data; and program code for releasing the data in the source database.

A fourth aspect of the invention provides a method for deploying an application for managing data for data aggregation, comprising: providing a computer infrastructure being operable to: determine the locations of data to be collected within a source database; acquire at least one access configuration log of the locations from which data will be collected; simultaneously collect data from a plurality of the locations; aggregate the collected data; normalize the aggregated data; store the normalized data; and release the data in the source database.

The illustrative aspects of the present invention are designed to solve the problems herein described and other problems not discussed, which are discoverable by a skilled artisan.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features of this invention will be more readily understood from the following detailed description of the various aspects of the invention taken in conjunction with the accompanying drawings that depict various embodiments of the invention, in which:

FIG. 1 shows a block and flow diagram of an illustrative method according to an embodiment of the invention;

FIG. 2 shows a block and flow diagram of another illustrative method according to an embodiment of the invention; and

FIG. 3 shows a block diagram of an illustrative system according to an embodiment of the invention.

It is noted that the drawings of the invention are not to scale. The drawings are intended to depict only typical aspects of the invention, and therefore should not be considered as limiting the scope of the invention. In the drawings, like numbering represents like elements between the drawings.

DETAILED DESCRIPTION OF THE INVENTION

Referring now to the drawings, FIG. 1 shows a block and flow diagram of an illustrative method according to an embodiment of the invention. At A, the locations of data 102, 104 to be collected from database 100 are determined. At B, an access configuration log is acquired for at least one of the locations determined at A. At C, data 102, 104 are simultaneously collected from a plurality of locations within database 100. Collecting data 102, 104 may optionally include buffering the data at E. Such buffering may be based, for example, on a previous collection, a current collection, and/or an upcoming collection.
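As a minimal sketch of the collection at C, the following Python reads from several locations concurrently using a thread pool. The in-memory dictionary standing in for database 100, the location names `loc_102` and `loc_104`, and the functions `collect_location` and `collect_all` are all hypothetical and are not part of the disclosure; a thread pool is only one possible way of collecting from a plurality of locations at once.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for database 100: location name -> rows stored there.
SOURCE_DB = {
    "loc_102": [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}],
    "loc_104": [{"id": 3, "value": "c"}],
}


def collect_location(location):
    """Read every row at one location and return (location, rows)."""
    return location, list(SOURCE_DB[location])


def collect_all(locations):
    """Collect from all locations concurrently, one worker per location."""
    with ThreadPoolExecutor(max_workers=max(1, len(locations))) as pool:
        # pool.map preserves input order, so results line up with `locations`.
        return dict(pool.map(collect_location, locations))


collected = collect_all(["loc_102", "loc_104"])
```

Here `collected` maps each location to a copy of its rows, so later steps can work on the copies without touching the source.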

At D, data 102, 104 are aggregated. Aggregating may include, for example, an update such as overwriting old data in a previous collection or inserting new data in a previous collection. Aggregating may also include constructing a data stream for the data collected at C, such as a comma separated value (CSV) data stream.
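The aggregation at D might be sketched as follows, assuming each row carries a unique key. The `id` key, the row layout, and the function names `aggregate` and `to_csv_stream` are illustrative assumptions rather than details taken from the disclosure.

```python
import csv
import io


def aggregate(previous, collected, key="id"):
    """Merge newly collected rows into a previous collection.

    A row whose key already exists overwrites the old row; any other row
    is inserted. The result is ordered by key.
    """
    merged = {row[key]: row for row in previous}
    for row in collected:
        merged[row[key]] = row
    return [merged[k] for k in sorted(merged)]


def to_csv_stream(rows, fieldnames):
    """Serialize rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


previous = [{"id": 1, "value": "old"}, {"id": 2, "value": "b"}]
collected = [{"id": 1, "value": "new"}, {"id": 3, "value": "c"}]
rows = aggregate(previous, collected)
stream = to_csv_stream(rows, ["id", "value"])
```

In this example the row with `id` 1 is overwritten, the row with `id` 3 is inserted, and `stream` holds the merged collection as CSV text.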

At F, the data aggregated at D are normalized. Normalizing data may include any number of actions, such as compressing the aggregated data, converting the aggregated data to another format, or adding an encryption key to the aggregated data. In short, normalizing may comprise any action or actions for placing the data in a form suitable for subsequent use.
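Two of the normalizing actions named above, format conversion and compression, can be illustrated with the Python standard library. The choice of JSON as the target format and gzip as the compression scheme is an assumption made for this sketch, as are the function names; the encryption-key action is not modeled here.

```python
import csv
import gzip
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text (with a header line) to a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, sort_keys=True)


def normalize(csv_text):
    """Convert the aggregated CSV to JSON, then gzip-compress it."""
    return gzip.compress(csv_to_json(csv_text).encode("utf-8"))


normalized = normalize("id,value\n1,new\n2,b\n")

# Round trip, to show that the normalized form is still usable downstream.
restored = json.loads(gzip.decompress(normalized).decode("utf-8"))
```

Note that `csv.DictReader` yields every field as a string, so numeric columns would need explicit conversion if the consuming application expects numbers.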

At G, the data normalized at F are stored. Such storage may be within database 100 or on another storage medium. Prior to storing the normalized data, the storage space necessary may be determined in order to ensure that sufficient storage space exists. Finally, at H, the database 100 is released, such that it is made available to other users and/or systems. The method shown in FIG. 1 thus eliminates the duplication of the database 100, as well as its attendant lag in currency.
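The storage-space check that may precede storing at G could look like the sketch below, which assumes the normalized data are written to a file system directory. The functions `has_room` and `store` and the file name are hypothetical; storage within database 100 and the release at H are not modeled.

```python
import os
import shutil
import tempfile


def has_room(directory, payload):
    """Return True if the file system holding `directory` can fit `payload`."""
    return shutil.disk_usage(directory).free >= len(payload)


def store(directory, name, payload):
    """Write `payload` to `directory/name` after checking available space."""
    if not has_room(directory, payload):
        raise OSError("insufficient storage space for normalized data")
    path = os.path.join(directory, name)
    with open(path, "wb") as handle:
        handle.write(payload)
    return path


with tempfile.TemporaryDirectory() as workdir:
    path = store(workdir, "normalized.bin", b"example payload")
    stored_size = os.path.getsize(path)
```

Checking the size before writing lets the method fail early, before any partial output exists.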

FIG. 2 shows another block and flow diagram according to an alternative embodiment of the invention. Here, one or more domain data objects 200 are used. Domain data objects 200 are collections of industry-specific rules, procedures, formats, functions, and/or styles that dictate the specifics of how the access configuration log is acquired (B′), and how data is aggregated (D′), normalized (F′), and stored (G′).

For example, the type of data to be collected, as well as how it is collected (e.g., in a particular format, utilizing a certain security standard, etc.) may vary depending on the particular industry involved or use to which the data will be put. A banking transaction may require that the data be in a different format than would a retail sales transaction or an electronic communication. Thus, the method of FIG. 2, and particularly the use of one or more domain data objects 200, permits the more standardized collection and aggregation of data based on the ultimate use to which the data will be put. Data may be collected and aggregated differently from the same database if a different domain data object 200 is used.
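One way to picture a domain data object 200 is as a small bundle of fields and formatting rules that the aggregation step consults. The `DomainDataObject` class, the `banking` and `retail` instances, and the field names below are invented purely for illustration; the disclosure does not specify any particular structure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass(frozen=True)
class DomainDataObject:
    """Hypothetical bundle of industry-specific rules."""
    name: str
    fields: List[str]                  # which fields to keep, in order
    format_row: Callable[[Row], str]   # how one kept row is rendered


banking = DomainDataObject(
    "banking", ["account", "amount"],
    lambda r: f"{r['account']}|{r['amount']}",
)
retail = DomainDataObject(
    "retail", ["sku", "amount"],
    lambda r: f"{r['sku']},{r['amount']}",
)


def aggregate_with(domain, rows):
    """Select and format rows according to the given domain data object."""
    return [
        domain.format_row({field: row[field] for field in domain.fields})
        for row in rows
    ]


rows = [{"account": "A-1", "sku": "S-9", "amount": "10.00"}]
banking_view = aggregate_with(banking, rows)
retail_view = aggregate_with(retail, rows)
```

The same source rows yield different aggregated output depending on which domain data object is passed in, which mirrors the point made in the preceding paragraph.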

FIG. 3 shows an illustrative system 10 for managing data for data aggregation. To this extent, system 10 includes a computer infrastructure 12 that can perform the various process steps described herein for managing data for data aggregation. In particular, computer infrastructure 12 is shown including a computer system 14 that comprises a data management system 40, which enables computer system 14 to manage data for data aggregation by performing the process steps of the invention.

Computer system 14 is shown including a processing unit 20, a memory 22, an input/output (I/O) interface 26, and a bus 24. Further, computer system 14 is shown in communication with external devices 28 and a storage system 30. As is known in the art, in general, processing unit 20 executes computer program code, such as data management system 40, that is stored in memory 22 and/or storage system 30. While executing computer program code, processing unit 20 can read and/or write data from/to memory 22, storage system 30, and/or I/O interface 26. Bus 24 provides a communication link between each of the components in computer system 14. External devices 28 can comprise any device that enables a user (not shown) to interact with computer system 14 or any device that enables computer system 14 to communicate with one or more other computer systems.

In any event, computer system 14 can comprise any general purpose computing article of manufacture capable of executing computer program code installed by a user (e.g., a personal computer, server, handheld device, etc.). However, it is understood that computer system 14 and data management system 40 are only representative of various possible computer systems that may perform the various process steps of the invention. To this extent, in other embodiments, computer system 14 can comprise any specific purpose computing article of manufacture comprising hardware and/or computer program code for performing specific functions, any computing article of manufacture that comprises a combination of specific purpose and general purpose hardware/software, or the like. In each case, the program code and hardware can be created using standard programming and engineering techniques, respectively.

Similarly, computer infrastructure 12 is only illustrative of various types of computer infrastructures for implementing the invention. For example, in one embodiment, computer infrastructure 12 comprises two or more computer systems (e.g., a server cluster) that communicate over any type of wired and/or wireless communications link, such as a network, a shared memory, or the like, to perform the various process steps of the invention. When the communications link comprises a network, the network can comprise any combination of one or more types of networks (e.g., the Internet, a wide area network, a local area network, a virtual private network, etc.). Regardless, communications between the computer systems may utilize any combination of various types of transmission techniques.

As previously mentioned, data management system 40 enables computer system 14 to manage data for data aggregation, including data mining and reporting. To this extent, data management system 40 is shown including a location determining system 42, a configuration log acquiring system 44, a data collecting system 46, a data aggregating system 48, a data normalizing system 50, a storing system 52, and a data releasing system. Operation of each of these systems is discussed above. Data management system 40 may further include other system components 56 to provide additional or improved functionality to data management system 40. It is understood that some of the various systems shown in FIG. 3 can be implemented independently, combined, and/or stored in memory for one or more separate computer systems 14 that communicate over a network. Further, it is understood that some of the systems and/or functionality may not be implemented, or additional systems and/or functionality may be included as part of system 10.

While shown and described herein as a method and system for managing data for data aggregation, it is understood that the invention further provides various alternative embodiments. For example, in one embodiment, the invention provides a computer-readable medium that includes computer program code to enable a computer infrastructure to manage data for data aggregation. To this extent, the computer-readable medium includes program code, such as data management system 40, that implements each of the various process steps of the invention. It is understood that the term “computer-readable medium” comprises one or more of any type of physical embodiment of the program code. In particular, the computer-readable medium can comprise program code embodied on one or more portable storage articles of manufacture (e.g., a compact disc, a magnetic disk, a tape, etc.), on one or more data storage portions of a computer system, such as memory 22 and/or storage system 30 (e.g., a fixed disk, a read-only memory, a random access memory, a cache memory, etc.), and/or as a data signal traveling over a network (e.g., during a wired/wireless electronic distribution of the program code).

In another embodiment, the invention provides a business method that performs the process steps of the invention on a subscription, advertising, and/or fee basis. That is, a service provider could offer to manage data for data aggregation, as described above. In this case, the service provider can create, maintain, support, etc., a computer infrastructure, such as computer infrastructure 12, that performs the process steps of the invention for one or more customers. In return, the service provider can receive payment from the customer(s) under a subscription and/or fee agreement and/or the service provider can receive payment from the sale of advertising space to one or more third parties.

In still another embodiment, the invention provides a method of generating a system for managing data for data aggregation. In this case, a computer infrastructure, such as computer infrastructure 12, can be obtained (e.g., created, maintained, having made available to, etc.) and one or more systems for performing the process steps of the invention can be obtained (e.g., created, purchased, used, modified, etc.) and deployed to the computer infrastructure. To this extent, the deployment of each system can comprise one or more of (1) installing program code on a computer system, such as computer system 14, from a computer-readable medium; (2) adding one or more computer systems to the computer infrastructure; and (3) incorporating and/or modifying one or more existing systems of the computer infrastructure, to enable the computer infrastructure to perform the process steps of the invention.

As used herein, it is understood that the terms “program code” and “computer program code” are synonymous and mean any expression, in any language, code or notation, of a set of instructions intended to cause a computer system having an information processing capability to perform a particular function either directly or after either or both of the following: (a) conversion to another language, code or notation; and (b) reproduction in a different material form. To this extent, program code can be embodied as one or more types of program products, such as an application/software program, component software/a library of functions, an operating system, a basic I/O system/driver for a particular computing and/or I/O device, and the like.

The foregoing description of various aspects of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed, and obviously, many modifications and variations are possible. Such modifications and variations that may be apparent to a person skilled in the art are intended to be included within the scope of the invention as defined by the accompanying claims.

What is claimed is:
 1. A method of managing data for data aggregation, the method comprising: determining a plurality of locations of data to be collected within a source database; acquiring at least one access configuration log of the plurality of locations from which data will be collected; simultaneously collecting data from the plurality of the locations; aggregating the collected data; normalizing the aggregated data; storing the normalized data; and releasing the data at each of the plurality of locations in the source database.
 2. The method of claim 1, wherein simultaneously collecting data includes buffering the collection based on at least one of the following: a previous collection; a current collection; or an upcoming collection.
 3. The method of claim 1, wherein aggregating includes constructing a comma separated value (CSV) data stream from the collected data.
 4. The method of claim 1, wherein aggregating includes at least one update selected from a group consisting of: overwriting old data in a previous collection and inserting new data in a previous collection.
 5. The method of claim 1, wherein normalizing includes at least one action selected from a group consisting of: compressing the aggregated data; or converting the aggregated data to another format.
 6. The method of claim 1, wherein storing includes determining a size of the normalized data to be stored.
 7. A system for managing data for data aggregation, the system comprising: at least one computing device; a system for determining a plurality of locations of data to be collected within a source database; a system for acquiring at least one access configuration log of the plurality of locations from which data will be collected; a system for simultaneously collecting data from the plurality of the locations; a system for aggregating the collected data; a system for normalizing the aggregated data; a system for storing the normalized data; and a system for releasing the data at each of the plurality of locations in the source database.
 8. The system of claim 7, wherein the system for simultaneously collecting data includes a system for buffering the collection based on at least one of the following: a previous collection; a current collection; or an upcoming collection.
 9. The system of claim 7, wherein the system for aggregating includes a system for constructing a comma separated value (CSV) data stream from the collected data.
 10. The system of claim 7, wherein the system for aggregating is operable to perform at least one of the following actions: overwrite old data in a previous collection and insert new data in a previous collection.
 11. The system of claim 7, wherein the system for normalizing is operable to perform at least one of the following actions: compress the aggregated data; or convert the aggregated data to another format.
 12. The system of claim 7, wherein the system for storing includes a system for determining a size of the normalized data to be stored.
 13. A program product stored on a computer-readable storage medium, which when executed, manages data for data aggregation, the program product comprising: program code for determining a plurality of locations of data to be collected within a source database; program code for acquiring at least one access configuration log of the plurality of locations from which data will be collected; program code for simultaneously collecting data from the plurality of the locations; program code for aggregating the collected data; program code for normalizing the aggregated data; program code for storing the normalized data; and program code for releasing the data at each of the plurality of locations in the source database.
 14. The program product of claim 13, wherein the program code for simultaneously collecting data includes program code for buffering the collection based on at least one of the following: a previous collection; a current collection; or an upcoming collection.
 15. The program product of claim 13, wherein the program code for aggregating includes program code for constructing a comma separated value (CSV) data stream from the collected data.
 16. The program product of claim 13, wherein the program code for aggregating includes program code for at least one of the following: overwriting old data in a previous collection and inserting new data in a previous collection.
 17. The program product of claim 13, wherein the program code for normalizing includes program code for at least one of the following: compressing the aggregated data; or converting the aggregated data to another format.
 18. A method for deploying an application for managing data for data aggregation, comprising: providing a computer infrastructure being operable to: determine a plurality of locations of data to be collected within a source database; acquire at least one access configuration log of the plurality of locations from which data will be collected; simultaneously collect data from the plurality of the locations; aggregate the collected data; normalize the aggregated data; store the normalized data; and release the data at each of the plurality of locations in the source database.
 19. The method of claim 18, wherein the computer infrastructure is further operable to buffer the collected data based on at least one of the following: a previous collection; a current collection; or an upcoming collection.
 20. The method of claim 18, wherein the computer infrastructure is further operable to perform at least one action selected from a group consisting of: compressing the aggregated data; or converting the aggregated data to another format.